


Competition-Induced Preferential Attachment 



N. Berger¹, C. Borgs¹, J. T. Chayes¹, R. M. D'Souza¹, and R. D. Kleinberg²

¹ Microsoft Research, One Microsoft Way, Redmond WA 98052, USA
² M.I.T. CSAIL, 77 Massachusetts Ave, Cambridge MA 02139, USA. Supported by a Fannie and John Hertz
Foundation Fellowship.



Abstract. Models based on preferential attachment have had much success in reproducing the power
law degree distributions which seem ubiquitous in both natural and engineered systems. Here, rather 
than assuming preferential attachment, we give an explanation of how it can arise from a more basic 
underlying mechanism of competition between opposing forces. 

We introduce a family of one-dimensional geometric growth models, constructed iteratively by locally 
optimizing the tradeoffs between two competing metrics. This family admits an equivalent description 
as a graph process with no reference to the underlying geometry. Moreover, the resulting graph process 
is shown to be preferential attachment with an upper cutoff. We rigorously determine the degree 
distribution for the family of random graph models, showing that it obeys a power law up to a finite 
threshold and decays exponentially above this threshold. 

We also introduce and rigorously analyze a generalized version of our graph process, with two natural 
parameters, one corresponding to the cutoff and the other a "fertility" parameter. Limiting cases 
of this process include the standard Barabasi-Albert preferential attachment model and the uniform 
attachment model. In the general case, we prove that the process has a power law degree distribution 
up to a cutoff, and establish monotonicity of the power as a function of the two parameters. 

1 Introduction 

1.1 Network Growth Models 

There is currently tremendous interest in understanding the mathematical structure of networks,
especially as we discover how pervasive network structures are in natural and engineered systems. Much recent
theoretical work has been motivated by measurements of real-world networks, indicating they have certain
"scale-free" properties, such as a power-law distribution of degree sequences. For the Internet graph, in
particular, both the graph of routers and the graph of autonomous systems (AS) seem to obey power laws [14,
15]. However, these observed power laws hold only for a limited range of degrees, presumably due to physical
constraints and the finite size of the Internet.

Many random network growth models have been proposed which give rise to power law degree distributions.
Most of these models rely on a small number of basic mechanisms, mainly preferential attachment*
[19,4] or copying [17], extending ideas known for many years [12,20,22,21] to a network context. Variants
of the basic preferential attachment mechanism have also been proposed, and some of these lead to changes
in the values of the exponents in the resulting power laws. For extensive reviews of work in this area, see
Albert and Barabasi [2], Dorogovtsev and Mendes [11], and Newman [18]; for a survey of the rather limited
amount of mathematical work see [7]. Most of this work concerns network models without reference to an
underlying geometric space. Nor do most of these models allow for heterogeneity of nodes, or address physical
constraints on the capacity of the nodes. Thus, while such models may be quite appropriate for geometry-free
networks, such as the web graph, they do not seem to be ideally suited to the description of other observed
networks, e.g., the Internet graph.

* As Aldous [3] points out, proportional attachment may be a more appropriate name, stressing the linear dependence
of the attractiveness on the degree.

In this paper, instead of assuming preferential attachment, we show that it can arise from a more basic 
underlying process, namely competition between opposing forces. The idea that power laws can arise from 
competing effects, modeled as the solution of optimization problems with complex objectives, was proposed 
originally by Carlson and Doyle [9]. Their "highly optimized tolerance" (HOT) framework has reliable design
as a primary objective. Fabrikant, Koutsoupias and Papadimitriou (FKP) [13] introduced an elegant network
growth model with such a mechanism, which they called "heuristically optimized trade-offs". As in many
growth models, the FKP network is grown one node at a time, with each new node choosing a previous node 
to which it connects. However, in contrast to the standard preferential attachment types of models, a key 
feature of the FKP model is the underlying geometry. The nodes are points chosen uniformly at random from 
some region, for example a unit square in the plane. The trade-off is between the geometric consideration 
that it is desirable to connect to a nearby point, and a networking consideration, that it is desirable to 
connect to a node that is "central" in the network as a graph. Centrality is measured by using, for example, 
the graph distance to the initial node. The model has a tunable, but fixed, parameter, which determines the 
relative weights given to the geometric distance and the graph distance. 

The suggestion that competition between two metrics could be an alternative to preferential attachment
for generating power law degree distributions represents an important paradigm shift. Though FKP
introduced this paradigm for network growth, and FKP networks have many interesting properties, the resulting
distribution is not a power law in the standard sense [5]. Instead, the overwhelming majority of the nodes are
leaves (degree one), and a second substantial fraction are heavily connected "stars" (hubs), producing a node
degree distribution which has clear bimodal features.**

Here, instead of directly producing power laws as a consequence of competition between metrics, we show 
that such competition can give rise to the preferential attachment mechanism, which in turn gives rise to 
power laws. Moreover, the power laws we generate have an upper cutoff, which is more realistic in the context 
of many applications. 

1.2 Overview of Competition-Induced Preferential Attachment 

We begin by formulating a general competition model for network growth. Let $x_0, x_1, \dots, x_t$ be a sequence
of random variables with values in some space $\Lambda$. We think of the points $x_0, x_1, \dots, x_t$ arriving one at a
time according to some stochastic process. For example, we typically take $\Lambda$ to be a compact subset of $\mathbb{R}^d$,
$x_0$ to be a given point, say the origin, and $x_1, \dots, x_t$ to be i.i.d. uniform on $\Lambda$. The network at time $t$ will
be represented by a graph, $G(t)$, on $t+1$ vertices, labeled $0, 1, \dots, t$, and at each time step, the new node
attaches to one or several nodes in the existing network. For simplicity, here we assume that each new node
connects to a single node, resulting in $G(t)$ being a tree.

** In simulations of the FKP model, this can be clearly discerned by examining the probability distribution function
(pdf); for the system sizes amenable to simulations, it is less prominent in the cumulative distribution function
(cdf).

Given $G(t-1)$, the new node, labeled $t$, attaches to that node $j$ in the existing network that minimizes
a certain cost function representing the trade-off of two competing effects, namely connection or startup
cost, and routing or performance cost. The connection cost is represented by a metric, $g_{ij}(t)$, on $\{0, \dots, t\}$
which depends on $x_0, \dots, x_t$, but not on the current graph $G(t-1)$, while the routing cost is represented by
a function, $h_j(t-1)$, on the nodes which depends on the current graph, but not on the physical locations
$x_0, \dots, x_t$ of the nodes $0, \dots, t$. This leads to the cost function

$$c_t = \min_j \left[\alpha g_{tj}(t) + h_j(t-1)\right], \qquad (1)$$

where $\alpha$ is a constant which determines the relative weighting between connection and routing costs. We
think of the function $h_j(t-1)$ as measuring the centrality of the node $j$; for simplicity, we take it to be the
hop distance along the graph $G(t-1)$ from $j$ to the root 0.

When $\Lambda$ is equipped with an appropriate norm $\|\cdot\|$, we can use a simplified algorithm, minimizing the
cost only over those points $j$ which are closer to the root than is the new point:

$$\min_{j:\ \|x_j\| < \|x_t\|} \left[\alpha g_{tj}(t) + h_j(t-1)\right]. \qquad (2)$$

In the original FKP model, $\Lambda$ is a compact subset of $\mathbb{R}^2$, say the unit square, and the points $x_i$ are
independently uniformly distributed on $\Lambda$. The cost function is of the form (1), with $g_{ij} = d_{ij}$, the Euclidean metric
(modeling the cost of building the physical transmission line), and $h_j(t)$ is the hop distance along the existing
network $G(t)$ from $j$ to the root. A rigorous analysis of the degree distribution of this two-dimensional model
was given in [5], and the analogous one-dimensional problem was treated in [16].
Our model is defined as follows. 

Definition 1 (Border Toll Optimization Process) Let $x_0 = 0$, and let $x_1, x_2, \dots$ be i.i.d., uniformly
at random in the unit interval $\Lambda = [0, 1]$, and let $G(t)$ be the following process: At $t = 0$, $G(t)$ consists of a
single vertex 0, the root. Let $h_j(t)$ be the hop distance to 0 along $G(t)$, and let $g_{ij}(t) = n_{ij}(t)$ be the number
of existing nodes between $x_i$ and $x_j$ at time $t$, which we refer to as the jump cost of $i$ connecting to $j$. Given
$G(t-1)$ at time $t-1$, a new vertex, labeled $t$, attaches to the node $j$ which minimizes the cost function
(2). Furthermore, if there are several nodes $j$ that minimize this cost function and satisfy the constraint, we
choose the one whose position $x_j$ is nearest to $x_t$. The process so defined is called the border toll optimization
process (BTOP).
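
To make Definition 1 concrete, the following is a minimal Python sketch of the BTOP that carries out the optimization directly at each step. It is an illustration under stated assumptions, not an implementation taken from the paper: the names `simulate_btop`, `alpha` and `seed` are ours, the cost weight is passed as an exact `Fraction` so that ties between costs are detected exactly, and coincident points (a probability-zero event) are ignored.

```python
import random
from fractions import Fraction


def simulate_btop(t_max, alpha, seed=0):
    """Grow a BTOP tree on [0, 1]; return (positions, parent)."""
    rng = random.Random(seed)
    xs = [0.0]        # x_0 = 0 is the root
    parent = [None]   # parent[t] is the node that vertex t attached to
    hops = [0]        # h_j: hop distance from j to the root
    for t in range(1, t_max + 1):
        x_t = 1.0 - rng.random()  # uniform on (0, 1], so always right of the root
        # Nodes to the left of x_t, nearest first: the node at index n has
        # exactly n existing nodes between itself and x_t.
        left = sorted((j for j in range(t) if xs[j] < x_t),
                      key=lambda j: xs[j], reverse=True)
        best, best_cost = None, None
        for n_between, j in enumerate(left):
            cost = alpha * n_between + hops[j]   # jump cost plus hop cost
            # Strict comparison: on a tie the nearer node (seen first) is kept.
            if best_cost is None or cost < best_cost:
                best, best_cost = j, cost
        xs.append(x_t)
        parent.append(best)
        hops.append(hops[best] + 1)
    return xs, parent
```

For example, `simulate_btop(1000, Fraction(1, 3))` grows a tree with cost weight 1/3. Each step scans all nodes to the left, so the sketch favors transparency over speed.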

As in the FKP model, the routing cost is just the hop distance to the root along the existing network.
However, in our model the connection cost metric measures the number of "borders" between two nodes:
hence the name BTOP. Note the correspondence to the Internet, where the principal connection cost is related
to the number of AS domains crossed, representing, e.g., the overhead associated with BGP, monetary costs
of peering agreements, etc. In order to facilitate a rigorous analysis of our model, we took the simpler cost
function (2), so that the new node always attaches to a node to its left.

It is interesting to note that the ratio of the BTOP connection cost metric to that of the one-dimensional
FKP model is just the local density of nodes: $n_{ij}/d_{ij} = \rho_{ij}$. Thus the transformation between the two models
is equivalent to replacing the constant parameter $\alpha$ in the FKP model with a variable parameter $\alpha_{ij} = \alpha\rho_{ij}$
which changes as the network evolves in time. That $\alpha_{ij}$ is proportional to the local density of nodes in the
network reflects a model with an increase in cost for local resources that are scarce or in high demand.
Alternatively, it can be thought of as reflecting the economic advantages of being first to market.

Somewhat surprisingly, the BTOP is equivalent to a special case of the following process, which closely 
parallels the preferential attachment model and makes no reference to any underlying geometry. 

Definition 2 (Generalized Preferential Attachment with Fertility and Aging) Let $A_1$, $A_2$ be two
positive integer-valued parameters. Let $G(t)$ be the following Markov process, whose states are finite rooted
trees in which each node is labeled either fertile or infertile. At time $t = 0$, $G(t)$ consists of a single fertile
vertex. Given the graph at time $t$, the new graph is formed in two steps: first, a new vertex, labeled $t+1$ and
initialized as infertile, connects to an old vertex $j$ with probability zero if $j$ is infertile, and with probability

$$\Pr(t+1 \to j) = \frac{\min\{d_j(t), A_2\}}{W(t)} \qquad (3)$$

if $j$ is fertile. Here, $d_j(t)$ is equal to 1 plus the out-degree of $j$, and $W(t) = \sum_j \min\{d_j(t), A_2\}$, where the sum
is only over fertile vertices. Second, if after the first step, $j$ has more than $A_1 - 1$ infertile children, one of
them, chosen uniformly at random, becomes fertile. The process so defined is called a generalized preferential
attachment process with fertility threshold $A_1$ and aging threshold $A_2$. The special case $A_1 = A_2$ is called
the competition-induced preferential attachment process with parameter $A_1$.
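
The two steps of Definition 2 can be sketched in a few lines of Python. This is a hedged illustration only: the function name `simulate_gpa` and its arguments `a1`, `a2`, `seed` are our own labels for the fertility threshold, the aging threshold and the random seed, and passing `float("inf")` for `a2` is our way of mimicking the case with no cutoff.

```python
import random


def simulate_gpa(t_max, a1, a2, seed=0):
    """Run the generalized preferential attachment process for t_max steps.

    Returns (fertile, children): fertile[v] is a bool, children[v] is the
    list of vertices that attached to v.
    """
    rng = random.Random(seed)
    fertile = [True]    # at time 0 there is a single fertile vertex
    children = [[]]
    for new in range(1, t_max + 1):
        # Step 1: choose a fertile vertex j with weight min{d_j, A_2},
        # where d_j is 1 plus the number of children of j.
        candidates = [j for j in range(new) if fertile[j]]
        weights = [min(1 + len(children[j]), a2) for j in candidates]
        j = rng.choices(candidates, weights=weights)[0]
        fertile.append(False)      # the new vertex starts out infertile
        children.append([])
        children[j].append(new)
        # Step 2: if j now has more than A_1 - 1 infertile children,
        # one of them, chosen uniformly at random, becomes fertile.
        infertile_kids = [c for c in children[j] if not fertile[c]]
        if len(infertile_kids) > a1 - 1:
            fertile[rng.choice(infertile_kids)] = True
    return fertile, children
```

With `a1 == a2` this simulates the competition-induced preferential attachment process; with `a1 = 1` every new vertex becomes fertile immediately.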

The last definition is motivated by the following theorem, to be proved in Section 2. 

Theorem 1 The border toll optimization process is equivalent to the competition-induced preferential
attachment process with parameter $A = \lceil \alpha^{-1} \rceil$.

Certain other limiting cases of the generalized preferential attachment process are worth noting. If $A_1 = 1$
and $A_2 = \infty$, we recover the standard Barabasi-Albert model of preferential attachment. If $A_1 = 1$ and $A_2$ is
finite, the model is equivalent to the standard model of preferential attachment with a cutoff. On the other
hand, if $A_1 = A_2 = 1$, we get a uniform attachment model.

The degree distribution of our random trees is characterized by the following theorem, which asserts that 
almost surely (a.s.) the fraction of vertices having degree k converges to a specified limit qk, and moreover 
that this limit obeys a power law for k < A2, and decays exponentially above A2. 

Theorem 2 Let $A_1$, $A_2$ be positive integers and let $G(t)$ be the generalized preferential attachment process
with fertility parameter $A_1$ and aging parameter $A_2$. Let $N_0(t)$ be the number of infertile vertices at time $t$,
and let $N_k(t)$ be the number of fertile vertices with $k-1$ children at time $t$, $k \ge 1$. Then:

1. There are numbers $q_k \in [0, 1]$ such that, for all $k \ge 0$,

$$\frac{N_k(t)}{t+1} \to q_k \quad \text{a.s., as } t \to \infty. \qquad (4)$$

2. There exists a number $w \in [0, 2]$ such that the $q_k$ are determined by the following equations:

$$q_i = \left(\prod_{k=2}^{i} \frac{k-1}{k+w}\right) q_1 \quad \text{if } i \le A_2, \qquad (5)$$

$$q_i = \left(\frac{A_2}{A_2+w}\right)^{i-A_2} q_{A_2} \quad \text{if } i > A_2, \qquad (6)$$

$$1 = \sum_{i=0}^{\infty} q_i, \quad \text{and} \quad q_0 = \sum_{i=1}^{\infty} q_i \min\{i-1, A_1-1\}.$$

3. There are positive constants $c_1$ and $C_1$, independent of $A_1$ and $A_2$, such that

$$c_1 k^{-(w+1)} < q_k/q_1 < C_1 k^{-(w+1)} \qquad (7)$$

for $1 \le k \le A_2$.

4. If $A_1 = A_2$, the parameter $w$ is equal to 1, and for general $A_1$ and $A_2$, it decreases with increasing $A_1$,
and increases with increasing $A_2$.

Equation (7) clearly defines a power law degree distribution with exponent $\gamma = w + 1$ for $k \le A_2$. Note
that for measurements of the Internet the value of the exponent for the power law is $\gamma \approx 2$. In our border
toll optimization model, where $A_1 = A_2$, we recover $\gamma = 2$.
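
As a worked instance, consider the competition-induced case $A_1 = A_2 = A$, for which part 4 of Theorem 2 gives $w = 1$. Reading (5) as the product $q_i = q_1 \prod_{k=2}^{i} \frac{k-1}{k+w}$ for $i \le A_2$, which is what recursion (16) yields, the product telescopes:

```latex
\[
\frac{q_i}{q_1} \;=\; \prod_{k=2}^{i}\frac{k-1}{k+1}
 \;=\; \frac{2\,(i-1)!}{(i+1)!}
 \;=\; \frac{2}{i(i+1)}, \qquad 1 \le i \le A,
\]
\[
\frac{q_i}{q_A} \;=\; \Bigl(\frac{A}{A+1}\Bigr)^{i-A}, \qquad i > A .
\]
```

Below the cutoff, $q_i/q_1$ therefore decays like $2 i^{-2}$, in agreement with $\gamma = w + 1 = 2$, while above the cutoff each additional unit of degree costs a constant factor $A/(A+1)$.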

The convergence claim of Theorem 2 is proved using a novel method which we believe is one of the 
main technical contributions of this paper. For preferential attachment models which have been analyzed 
in the past [1,6,8,10], the convergence was established using the Azuma-Hoeffding martingale inequality. 
To establish the bounded-differences hypothesis required by that inequality, those proofs employed a clever 
coupling of the random decisions made by the various edges, such that the decisions made by an edge e only 
influence the decisions of subsequent edges who choose to imitate e's choices. A consequence of this coupling 
is that if e made a different decision, it would alter the degrees of only finitely many vertices. This in turn 
allows the required bounded-differences hypothesis to be established. No such approach is available for our 
models, because the coupling fails. The random decisions made by an edge e may influence the time at which 
some node v crosses the fertility or attractiveness threshold, which thereby exerts a subtle influence on the 
decisions of every future edge, not only those who choose to imitate e. 

Instead we introduce a new approach based on the second moment method. The argument establishing the 
requisite second-moment upper bound is quite subtle; it depends on a computation involving the eigenvalues 
of a matrix describing the evolution of the degree sequence in a continuous-time version of the model. The 
key observation is that, in this continuous-time model, the expected number of vertices of each degree grows 
exponentially at a rate determined by the largest eigenvalue, w, of this matrix, while the variance of the 
number of vertices of each degree has an exponential growth rate which is at most the second eigenvalue. 
For the matrix in question, the top eigenvalue has multiplicity 1, thus ensuring that the variance grows more 
slowly than the mean. We then translate this continuous-time result into a rigorous convergence result for 
the original discrete-time system. 




Fig. 1. A sample instance of BTOP for A = 3, showing the process on the unit interval (on the left), and the resulting 
tree (on the right). Fertile vertices are marked red, infertile ones are marked white. Note that vertex 1 became fertile 
at t = 3. 

2 Equivalence of the two models 

2.1 Basic properties of the border toll optimization process 

In this section we turn to the BTOP defined in the introduction, establishing some basic properties
which will enable us to prove that it is equivalent to the competition-induced preferential attachment model.
In order to avoid complications we exclude the case that some of the $x_i$'s are identical, an event that has
probability zero. We say that $j \in \{0, 1, \dots, t\}$ lies to the right of $i \in \{0, 1, \dots, t\}$ if $x_i < x_j$, and we say that
$j$ lies directly to the right of $i$ if $x_i < x_j$ but there is no $k \in \{1, \dots, t\}$ such that $x_i < x_k < x_j$. In a similar
way, we say that $j$ is the first vertex with a certain property to the right of $i$ if $j$ has that property and there
exists no $k \in \{1, \dots, t\}$ such that $x_i < x_k < x_j$ and $k$ has the property in question.

Definition 3 A vertex $i$ is called fertile at time $t$ if a new point that arrives at time $t+1$ and lands directly
to the right of $x_i$ attaches itself to the node $i$. Otherwise $i$ is called infertile at time $t$.



This definition is illustrated in Fig. 1. 



Lemma 1. Let $0 < \alpha < \infty$, let $A = \lceil \alpha^{-1} \rceil$, and let $0 \le t < \infty$. Then

i) The node 0 is fertile at time $t$.

ii) Let $i$ be fertile at time $t$. If $i$ is the rightmost fertile vertex at time $t$ (case 1), let $\ell$ be the number of
infertile vertices to the right of $i$. Otherwise (case 2), let $j$ be the next fertile vertex to the right of $i$, and let
$\ell = n_{ij}(t)$. Then $0 \le \ell \le A - 1$, and the $\ell$ infertile vertices located directly to the right of $i$ are children of $i$.
In case 2, if $h_j > h_i$, then $j$ is a fertile child of $i$ and $\ell = A - 1$. As a consequence, the hop count between
two consecutive fertile vertices never increases by more than 1 as we move to the right, and if it increases
by 1, there are $A - 1$ infertile vertices between the two fertile ones.

iii) Assume that the new vertex at time $t+1$ lands between two consecutive fertile vertices $i$ and $j$, and
let $\ell = n_{ij}(t)$. Then $t+1$ becomes a child of $i$. If $\ell + 1 < A$, the new vertex is infertile at time $t+1$, and
the fertility of all old vertices is unchanged. If $\ell + 1 = A$ and the new vertex lies directly to the left of $j$, the
new vertex is fertile at time $t+1$ and the fertility of the old vertices is unchanged. If $\ell + 1 = A$ and the new
vertex does not lie directly to the left of $j$, the new vertex is infertile at time $t+1$, the vertex directly to the left
of $j$ becomes fertile, and the fertility of all other vertices is unchanged.

iv) If $t+1$ lands to the right of the rightmost fertile vertex at time $t$, the statements in iii) hold with $j$
replaced by the right endpoint of the interval $[0, 1]$, and $n_{ij}(t)$ replaced by the number of vertices to the right
of $i$.

v) If $i$ is fertile at time $t$, it is still fertile at time $t+1$.

vi) If $i$ has $k$ children at time $t$, the $\ell = \min\{A - 1, k\}$ leftmost of them are infertile at time $t$, and any
others are fertile.

Proof. Statement i) is trivial, and statements v) and vi) follow immediately from iii) and iv), so we are left
with ii)-iv). We proceed by induction on $t$. If ii) holds at time $t$, and iii) and iv) hold for a new vertex
arriving at time $t+1$, ii) clearly also holds at time $t+1$. We therefore only have to prove that ii) at time $t$
implies iii) and iv) for a new vertex arriving at time $t+1$. Using, in particular, the last statement of ii) as
a key ingredient, the proof is straightforward but lengthy, so it is not reproduced here. The interested
reader can find it in Appendix A.

2.2 Proof of Theorem 1 

In the BTOP, note that our cost function

$$\min_j \left[\alpha n_{tj}(t) + h_j(t-1)\right], \qquad (8)$$

and hence the graph $G(t)$, only depends on the order of the vertices $x_0, \dots, x_t$, and not on their actual
positions in the interval $[0, 1]$. Let $\pi(t)$ be the permutation of $\{0, 1, \dots, t\}$ which orders the vertices $x_0, \dots, x_t$
from left to right, so that

$$0 = x_{\pi_0(t)} < x_{\pi_1(t)} < \cdots < x_{\pi_t(t)}. \qquad (9)$$

(Recall that the vertices $x_0, x_1, \dots, x_t$ are pairwise distinct with probability one.) We can consider a change
of variables, from the $x$'s to the lengths of the intervals between successive ordered vertices:

$$s_i(t) = x_{\pi_{i+1}(t)} - x_{\pi_i(t)} \ \text{ if } 0 \le i \le t-1 \quad \text{and} \quad s_t(t) = 1 - x_{\pi_t(t)}. \qquad (10)$$

The lengths then obey the constraint $\sum_{i=0}^{t} s_i(t) = 1$. The set of interval lengths $s(t)$, together with the set
of permutation labels $\pi(t) = (\pi_0(t), \pi_1(t), \dots, \pi_t(t))$, is an equivalent representation of the original set of
position variables $x(t)$.

Let us consider the process $\{\pi(t)\}_{t \ge 1}$. It is not hard to show that this process is a Markov process, with
the initial permutation being the trivial permutation given by $\pi_i(1) = i$, and the permutation at time $t+1$
obtained from $\pi(t)$ by inserting the new point $t+1$ into a uniformly random position. More explicitly, the
new permutation $\pi(t+1)$ is obtained from $\pi(t)$ by choosing $i_0 \in \{1, \dots, t+1\}$ uniformly at random, and
setting

$$\pi_i(t+1) = \begin{cases} \pi_i(t) & \text{if } i < i_0 \\ t+1 & \text{if } i = i_0 \\ \pi_{i-1}(t) & \text{if } i > i_0. \end{cases} \qquad (11)$$
Indeed, let $I_k(t) = [x_{\pi_k(t)}, x_{\pi_{k+1}(t)}]$, and consider for a moment the process $(\pi(t), s(t))$. Then the conditional
probability that the next point arrives in the $k$-th interval, $I_k$, depends only on the interval length at time $t$:

$$\Pr\left[x_{t+1} \in I_k \mid \pi(t), s(t), \pi(t-1), s(t-1), \dots, \pi(0), s(0)\right] = \Pr\left[x_{t+1} \in I_k \mid \pi(t), s(t)\right] = s_k(t). \qquad (12)$$

Integrating out the dependence on the interval length from the above equation we get:

$$\Pr\left[x_{t+1} \in I_k \mid \pi(t)\right] = \int \Pr\left[x_{t+1} \in I_k \mid \pi(t), s(t)\right] dP(s(t)) = \int s_k(t)\, dP(s(t)) = \frac{1}{t+1}, \qquad (13)$$

since after the arrival of $t$ points, there exist $t+1$ intervals. The probability that the next point arrives
in the $k$-th interval is uniform over all the intervals, proving that $\pi(t)$ is indeed a Markov chain with the
transition probabilities described above.
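
The uniform-insertion dynamics of the ordering can be sketched as follows. This is a small illustration in Python with names of our own choosing (`grow_permutation`, `seed`); it starts from the one-element ordering at time 0 and inserts each new label at a uniformly chosen index in $\{1, \dots, t+1\}$, so that label 0 always stays in the leftmost position.

```python
import random


def grow_permutation(t_max, seed=0):
    """Return the left-to-right ordering of labels 0..t_max after t_max insertions."""
    rng = random.Random(seed)
    pi = [0]                          # only the root is present at time 0
    for t in range(t_max):
        i0 = rng.randint(1, t + 1)    # uniform on {1, ..., t + 1}, both ends included
        # Entries at index < i0 keep their place; entries at index >= i0
        # shift one step to the right, as in the case distinction for pi(t + 1).
        pi.insert(i0, t + 1)
    return pi
```

Since the graph depends only on this ordering, a simulation of the BTOP could be driven by such a list instead of by real-valued positions.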

With the help of Lemma 1, we now easily derive a description of the graph $G(t)$ which does not involve
any optimization problem. To this end, let us consider a vertex $i$ with $\ell$ infertile children at time $t$. If a new
vertex falls into the interval directly to the right of $i$, or into one of the intervals directly to the right of
an infertile child of $i$, it will connect to the vertex $i$. Since there is a total of $t+1$ intervals at time $t$, the
probability that a vertex $i$ with $\ell$ infertile children grows an offspring is $(\ell+1)/(t+1)$. By Lemma 1 (vi),
this number is equal to $\min\{A, k_i\}/(t+1)$, where $k_i - 1$ is the number of children of $i$. Note that fertile
children don't contribute to this probability, since vertices falling into an interval directly to the right of a
fertile child will connect to the child, not the parent.

Assume now that i did get a new offspring, and that it had A — 1 infertile children at time t. Then the 
new vertex is either born fertile, or makes one of its infertile siblings fertile. Using the principle of deferred 
decisions, we may assume that with probability 1/A the new vertex becomes fertile, and with probability 
{A — 1)/A an old one, chosen uniformly at random among the A — 1 candidates, becomes fertile. 

We thus have shown that the solution G{t) of the optimization problem (8) can alternatively be described 
by the competition-induced preferential attachment model with parameter A. 



3 Convergence of the Degree Distribution 



3.1 Overview 

To characterize the behavior of the degree distribution, we will derive a recursion which governs the evolution
of the expected number of vertices of each degree, at the time when there are $\tau$ nodes in the network. The
coefficients of this recursion are random variables depending on $W(\tau)$, the combined attractiveness of all
vertices at time $\tau$. To simplify the analysis of the recursion, we introduce a continuous-time model which is
equivalent to the original discrete-time model up to a (random) reparametrization of the time coordinate. The
kernel of this continuous-time Markov chain is a matrix $M$ whose coefficients we identify explicitly. In this
section we will prove that the expected degree distribution converges to a scalar multiple of the eigenvector
$p$ of $M$ associated with the largest eigenvalue $w$. The much more difficult proof that the empirical degree
distribution converges a.s. to the same limit is deferred to Appendix B.

3.2 Notation 

Let $A > \max(A_1, A_2)$. At (discrete) time $\tau$, let $N_0(\tau)$ be the number of infertile vertices at time $\tau$, and, for
$k \ge 1$, let $N_k(\tau)$ be the number of fertile vertices with $k-1$ children at time $\tau$. Let $\tilde N_A(\tau) = N_{\ge A}(\tau) =
\sum_{k \ge A} N_k(\tau)$, and $\tilde N_k(\tau) = N_k(\tau)$ if $k < A$. Finally let $\hat N_k(\tau) = \frac{1}{\tau} \tilde N_k(\tau)$, and let $\tilde n_k(\tau)$, $\hat n_k(\tau)$ be the
expected values of $\tilde N_k(\tau)$, $\hat N_k(\tau)$, respectively.

3.3 Evolution of the expected value 

From the definition of the generalized preferential attachment model, it is easy to derive the probabilities
for the various alternatives which may happen upon the arrival of the $(\tau+1)$-st node:

- With probability $A_2 \tilde N_A(\tau)/W(\tau)$, it attaches to a node of degree $\ge A$. This increments $\tilde N_1$, and leaves
$\tilde N_A$ and all $\tilde N_j$ with $1 < j < A$ unchanged.

- With probability $\min(A_2, k)\, \tilde N_k(\tau)/W(\tau)$, it attaches to a node of degree $k$, where $1 \le k < A$. This
increments $\tilde N_{k+1}$, decrements $\tilde N_k$, increments $\tilde N_0$ or $\tilde N_1$ depending on whether $k < A_1$ or $k \ge A_1$, and
leaves all other $\tilde N_j$ with $j < A$ unchanged.

It follows that the discrete-time process $\{\tilde N_k(\tau)\}$ is equivalent to the state of the following continuous-time
stochastic process (with time parameter $t$) at the time of the $\tau$-th event.

- With rate $A_2 \tilde N_A(t)$, $\tilde N_1$ increases by 1.

- For every $1 \le k < A$, with rate $\tilde N_k(t) \min(k, A_2)$, the following happens:

$$\tilde N_k \to \tilde N_k - 1; \quad \tilde N_{k+1} \to \tilde N_{k+1} + 1; \quad \tilde N_{g(k)} \to \tilde N_{g(k)} + 1,$$

where $g(k) = 0$ for $k < A_1$ and $g(k) = 1$ otherwise.



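Let $M$ be the following $A \times A$ matrix:

$$M_{i,j} = \begin{cases} -\min(j, A_2) & \text{if } 1 \le i = j \le A-1 \\ \min(j, A_2) & \text{if } 1 \le i = j+1 \le A \\ \min(j, A_2) & \text{if } i = 1 \text{ and } j \ge A_1 \\ 0 & \text{otherwise.} \end{cases}$$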

Then, for the continuous time process, for every $t > s$, the conditional expectations of the vector $\tilde N(t)$ are
given by

$$E\bigl(\tilde N(t) \mid \tilde N(s)\bigr) = e^{(t-s)M}\, \tilde N(s). \qquad (14)$$

It is easy to see that the matrix $e^{M}$ has all positive entries, and therefore (by the Perron-Frobenius Theorem)
$M$ has a unique eigenvector $p$ of $\ell_1$-norm 1, having all positive entries. Let $w$ be the eigenvalue corresponding
to $p$. Then $w$ is real, it has multiplicity 1, and it exceeds the real part of every other eigenvalue. Therefore,
for every non-zero vector $n$ with non-negative entries,

$$\lim_{t \to \infty} e^{-tw} e^{tM} n = \langle a, n \rangle\, p,$$

where $a$ is the eigenvector of $M^{T}$ corresponding to $w$. Note that $\langle a, n \rangle > 0$ because $n$ is non-zero and
non-negative, and $a$ is positive, again by Perron-Frobenius. Therefore, up to a scalar factor, the vector
$n(t) := E\bigl(e^{-wt} \tilde N(t)\bigr)$ converges to $p$ as $t \to \infty$. Note that this implies, in particular, that $w > 0$. We can
also show that $w \le 2$. This is a consequence of Claim 2 in Appendix B, which says that $N_k(t)$ is stochastically
dominated by a stochastic process $X_t$ satisfying $E(X_t) = e^{2t}$.

To conclude that the discrete time version, $\hat n(\tau)$, converges to $p$ as well, one needs to show that, a.s., $\tau$ is
finite for all finite $t$. This is done in Claims 1 and 2 in Appendix B.
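
For small parameters, the top eigenvalue $w$ and eigenvector $p$ of $M$ can be approximated numerically. The sketch below is only an illustration under our own assumptions: the names `build_matrix` and `top_eigenvalue` are ours, the matrix is filled in from the transition rates of the continuous-time process (so a vertex of degree $j \ge A_1$ contributes to the first row in addition to any diagonal entry), and plain power iteration on the shifted matrix $M + cI$ with $c \ge A_2$ is a technique we substitute here; it is not the method of the paper.

```python
def build_matrix(a1, a2, size):
    """Return the size-by-size rate matrix M as nested lists (0-based storage)."""
    m = [[0.0] * size for _ in range(size)]
    for j in range(1, size + 1):
        rate = min(j, a2)
        if j <= size - 1:
            m[j - 1][j - 1] -= rate   # a vertex leaves degree class j ...
            m[j][j - 1] += rate       # ... and enters class j + 1
        if j >= a1:
            m[0][j - 1] += rate       # a new fertile vertex of degree 1 appears
    return m


def top_eigenvalue(m, shift, iters=5000):
    """Power iteration on M + shift*I; returns (eigenvalue of M, l1-normalized vector)."""
    size = len(m)
    v = [1.0 / size] * size
    lam = 0.0
    for _ in range(iters):
        u = [sum(m[i][j] * v[j] for j in range(size)) + shift * v[i]
             for i in range(size)]
        total = sum(u)            # v sums to 1, so this estimates w + shift
        lam = total - shift
        v = [x / total for x in u]
    return lam, v
```

For instance, `top_eigenvalue(build_matrix(3, 3, 6), shift=3)` should return an eigenvalue close to 1, in line with the statement that $w = 1$ when $A_1 = A_2$.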



4 Power law with a cutoff 

In the previous section, we saw that for every $A > \max\{A_1, A_2\}$, the limiting proportions up to $A - 1$ are given by $p$,
where $p$ is the eigenvector corresponding to the highest eigenvalue $w$ of the $A$-by-$A$ matrix

$$M_{i,j} = \begin{cases} -\min(j, A_2) & \text{if } 1 \le i = j \le A-1 \\ \min(j, A_2) & \text{if } 1 \le i = j+1 \le A \\ \min(j, A_2) & \text{if } i = 1 \text{ and } j \ge A_1 \\ 0 & \text{otherwise.} \end{cases} \qquad (15)$$



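Therefore, the proportions $p$ satisfy the equation:

$$w p_i = -\min(i, A_2)\, p_i + \min(i-1, A_2)\, p_{i-1}, \qquad i \ge 2, \qquad (16)$$

where the normalization is determined by $\sum_i p_i = 1$. From (16) we get that for $i \le A_2$,

$$p_i = \left(\prod_{k=2}^{i} \frac{k-1}{k+w}\right) p_1, \qquad (17)$$

and for $i > A_2$

$$p_i = \left(\frac{A_2}{A_2+w}\right)^{i-A_2} p_{A_2}. \qquad (18)$$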

Clearly, (18) is exponentially decaying. There are many ways to see that (17) behaves like a power-law with
degree $1 + w$. The simplest would probably be:

$$\frac{p_i}{p_1} = \prod_{k=2}^{i}\left(1 - \frac{w+1}{k+w}\right) = \exp\left((-1-w)\sum_{k=2}^{i} k^{-1} + O(1)\right) = \exp\left((-1-w)\sum_{k=2}^{i-1} \log\left(\frac{k+1}{k}\right) + O(1)\right)$$

$$= \exp\bigl((-1-w)\log(i/2) + O(1)\bigr) = O(1)\, i^{-1-w}. \qquad (19)$$

Note that the constants implicit in the $O(\cdot)$ symbols do not depend on $A_1$, $A_2$ or $i$, due to the fact that
$0 < w \le 2$. (19) can be stated in the following way:

Proposition 3 There exist $0 < c < C < \infty$ such that for every $A_1$, $A_2$ and $i \le A_2$, if $w = w(A_1, A_2)$ is as
in (16), then

$$c\, i^{-1-w} \le \frac{p_i}{p_1} \le C\, i^{-1-w}. \qquad (20)$$

The vector $q = (q_1, q_2, \dots)$ is a scalar multiple of $p$, so equations (5), (6), and (7) in Theorem 2 (and the
comment immediately following it) are consequences of equations (17), (18), and (20) derived above. It
remains to prove the normalization conditions

$$\sum_{i=0}^{\infty} q_i = 1; \qquad q_0 = \sum_{i=1}^{\infty} q_i \min(i-1, A_1-1)$$

stated in Theorem 2. These follow from the equations

$$\sum_{i=0}^{\infty} N_i(t) = t+1; \qquad N_0(t) = \sum_{i=1}^{\infty} N_i(t) \min(i-1, A_1-1).$$

The first of these simply says that there are $t+1$ vertices at time $t$; the second equation is proved by counting
the number of infertile children of each fertile node.

The monotonicity properties of w asserted in part 4 of Theorem 2 are proved in Appendix C. The 
concentration of the empirical degree distribution is proved in Appendix B. 
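
The exponent $w$ can also be computed without forming the matrix, by combining recursion (16) with the first row of (15). The Python sketch below rests on assumptions that we state explicitly: the first-row identity $w p_1 = -p_1 + \sum_{j \ge A_1} \min(j, A_2)\, p_j$ is our reading of (15) in the limit of large $A$ (with overlapping cases summed), the tail of the sum beyond $\max(A_1, A_2)$ is evaluated as a geometric series, and bisection on the interval $(0, 4)$ is simply a convenient root finder. The function names `ratios`, `residual` and `solve_w` are ours.

```python
def ratios(w, a2, n):
    """Return r with r[i] = p_i / p_1 for i = 1..n, using recursion (16); r[0] is unused."""
    r = [0.0, 1.0]
    for i in range(2, n + 1):
        r.append(r[-1] * min(i - 1, a2) / (w + min(i, a2)))
    return r


def residual(w, a1, a2):
    """First-row identity divided by p_1, moved to one side; zero at the true w."""
    n = max(a1, a2)
    r = ratios(w, a2, n)
    head = sum(min(j, a2) * r[j] for j in range(a1, n + 1))
    # For j > n the ratios are geometric with factor a2 / (a2 + w), so the
    # remaining sum of a2 * r[j] over j > n equals a2 * r[n] * a2 / w.
    tail = a2 * r[n] * a2 / w
    return head + tail - 1.0 - w


def solve_w(a1, a2, lo=1e-9, hi=4.0, iters=200):
    """Bisection; residual is decreasing in w, positive near 0 and negative at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if residual(mid, a1, a2) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions, `solve_w(a, a)` should return a value close to 1 for any positive integer `a`, and comparing `solve_w` across parameter choices gives a quick numerical look at the monotonicity asserted in part 4 of Theorem 2.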
A Proof of Lemma 1



In this appendix, we complete the proof of Lemma 1.

To this end, let us first recall that the only non-trivial part is the fact that ii) at time $t$ implies iii) and
iv) for a new vertex arriving at time $t+1$. Assume thus that ii) holds at time $t$.

At time $t+1$, a new vertex arrives, and falls directly to the right of some vertex $k$. Let $i$ be the nearest
vertex to the left of $k$ that was fertile at time $t$ (if $k$ is fertile at time $t$, we set $i = k$), let $j$ be the nearest
vertex to the right of $i$ that was fertile at time $t$ (we assume for the moment that $i$ is not the rightmost
fertile vertex at time $t$), and let $\ell$ be the number of vertices between $i$ and $j$ at time $t$.

Let us first prove that the vertex $t+1$ connects to $i$. If $i = k$, this is obvious, since $i$ is fertile at time
$t$. We may therefore assume that $k \ne i$. For the new vertex $t+1$, the cost of connecting to the vertex $i$ is
then equal to $\alpha(n_{ik}(t) + 1)$. Let us first compare this cost to the cost of connecting to a fertile vertex $i'$ to
the left of $i$. Let $i_0 = i'$, let $i_s = i$, and let $i_1, \dots, i_{s-1}$ be the fertile vertices between $i'$ and $i$, ordered from
left to right. If $h_{i_{m-1}} < h_{i_m}$, we use the inductive assumption ii) to conclude that the number of infertile
vertices between $i_{m-1}$ and $i_m$ is equal to $A - 1$, and $h_{i_{m-1}} = h_{i_m} - 1$. A decrease of $q$ in the hop cost is
therefore accompanied by an increase in the jump cost of at least $\alpha A q \ge q$. As a consequence, it never pays
to connect to a fertile vertex $i'$ to the left of $i$. The cost of connecting to an infertile vertex to the left of $i$
is even higher, since the hop count of an infertile vertex is at best equal to the hop count of the next fertile
vertex to the right. We therefore only have to consider the connection cost to some of the infertile children
of $i$. But again, the hop count is worse by 1 when compared to the hop count of $i$, and the jump cost is at
best reduced by $(A - 1)\alpha < 1$, proving that the cost of connecting to $i$ is minimal.

To discuss the fertility of the vertices in the graph $G(t+1)$, we need to consider the arrival of a second
vertex, labeled $t+2$. If $t+2$ falls to the left of $t+1$, it will face an optimization problem that has not been
changed by the arrival of the vertex $t+1$, implying that the fertility of the vertices to the left of $t+1$ is
unchanged. If $t+2$ falls to the right of $j$, the cost of connecting to $j$ or one of the vertices to the right of
$j$ is the same as before, and the cost of connecting to a vertex to the left of $j$ is at best equal (the cost of
connecting to any vertex to the left of $t+1$ is in fact higher, due to the additional cost of jumping over the
vertex $t+1$). Therefore, the vertex $t+2$ will still prefer to connect to either $j$ or one of the vertices to the
right of $j$, implying that the fertility of the vertices to the right of $j$ has not changed at all. We therefore are
left with analyzing the case where $t+2$ falls between $t+1$ and $j$. Again, the vertex $t+2$ will prefer $i$ over
any vertex to the left of $i$ (the cost analysis is the same as the one used for $t+1$ above), so we just have to
compare the costs of connecting to the different vertices between $i$ and $j$. If $\ell + 1 < A$, this will again imply
that $t+2$ connects to $i$; but if $\ell + 1 = A$, the vertex $t+2$ will only connect to $i$ if it does not fall to the right
of the rightmost of the now $\ell + 1$ vertices between $i$ and $j$. If it falls to the right of this vertex, it will be
as expensive to connect to the rightmost of the now $\ell + 1$ vertices between $i$ and $j$ as it is to connect to $i$.
Recalling our convention of connecting to the nearest vertex to the left if there is a tie in costs, this proves
that now $t+2$ connects to the rightmost vertex between $i$ and $j$, implying that this vertex is fertile.

The above considerations prove the fertility statements in iii), and thus complete the proof of iii). The
case where $i$ is the rightmost fertile vertex at time $t$ is similar (in fact, it is slightly easier since it involves
fewer cases), and leads to the proof of iv). This completes the proof of Lemma 1.



B Concentration of $N_k(t)$

B.1 Concentration of the continuous time process

In order to show concentration of the continuous time process, we will prove the following two lemmas:

Lemma 2. For every $v < w$ and every $1 \le k \le A$, a.s. for every $t$ large enough,

$$N_k(t) > e^{vt},$$

and

Lemma 3. There exists $v < w$ s.t. for every $1 \le k < j \le A$, a.s. for every $t$ large enough,

$$p_j N_k(t) - p_k N_j(t) < e^{vt}.$$

In order to prove Lemmas 2 and 3, we need to use some estimates considering the standard birth process 
described below. 

Definition 4 Let $\{\sigma_n\}_{n=1}^{\infty}$ be independent exponential random variables, so that $E(\sigma_n) = \tfrac{1}{2} n^{-1}$. For $t \in [0, \infty)$, let $X_t = \inf\{n : \sum_{k=1}^{n} \sigma_k > t\}$. Then $X$ is called the standard birth process.
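
Definition 4 can be simulated directly by accumulating the waiting times $\sigma_n$ until they exceed $t$. The following is a minimal sketch, not part of the original argument; the function names `sample_birth_process` and `empirical_mean`, the sample size, and the seed are illustrative assumptions.

```python
import math
import random


def sample_birth_process(t, rng):
    """Return X_t = inf{n : sigma_1 + ... + sigma_n > t} for one sample path."""
    elapsed = 0.0
    n = 0
    while True:
        n += 1
        # sigma_n is exponential with mean 1/(2n), i.e. rate 2n.
        elapsed += rng.expovariate(2.0 * n)
        if elapsed > t:
            return n


def empirical_mean(t, samples, seed=0):
    """Average X_t over independent sample paths (seed fixed for reproducibility)."""
    rng = random.Random(seed)
    return sum(sample_birth_process(t, rng) for _ in range(samples)) / samples
```

For example, `empirical_mean(0.5, 20000)` should come out close to $e^{2 \cdot 0.5} = e$, in line with the growth rate $E(X_t) = O(e^{2t})$ used in Appendix D.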

The following claim will be proved in Appendix D. 

Claim 1 $X_t$ is almost surely finite for every $t$. Furthermore, there exists a constant Cg such that for every
$t_2 > t_1$ and $x$ and $k$,

P(x,>fexe^(--)|X,=.)<-^. (21) 

The standard birth process is connected to our discussion through the following easy claim: 

Claim 2 Let $\|N(t)\| = \sum_{k=1}^{A} N_k(t)$. Let $T > 0$, let $x > y$, and let $X$ be a standard birth process. Then
$\{\{X_t\}_{t>T} \mid X_T = x\}$ stochastically dominates $\{\{\|N(t)\|\}_{t>T} \mid \|N(T)\| = y\}$.

Proof. Let $\{\tau_n\}_{n=1}^{\infty}$ be i.i.d. exponential random variables with mean 1. Then $\sum_{k=1}^{n} \sigma_k$ has the same distribution as $\sum_{k=1}^{n} \tau_k/(2k)$. The time at which the $n$-th node is born has the same distribution as $\sum_{k=1}^{n} \tau_k/W(k)$,
where $W(k)$ denotes the combined attractiveness of all nodes at time $k$. The claim now follows from the
observation that $W(k) \le 2k$.

Proof (Proof of Lemma 3). We use a martingale to bound the variance. Fix $T$, and let

$$L_t = E\bigl(p_j N_k(T) - p_k N_j(T) \mid N(t)\bigr).$$

Clearly, $L_t$ is a (continuous time) martingale. Let $b = b^{(k,j)}$ be the vector

$$b_i = \begin{cases} -p_k & \text{if } i = j \\ p_j & \text{if } i = k \\ 0 & \text{otherwise.} \end{cases}$$

By (14), we know that $L_t = b^{\top} e^{(T-t)M} N(t)$. Moreover $b^{\top} p = 0$, and therefore the norm of $b^{\top} e^{(T-t)M}$ is bounded by
$e^{(T-t)v'}$ for some $v' < w$. Without loss of generality, we may assume that $v' > w/2$.



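Claim 3

$$\mathrm{var}\bigl(p_j N_k(T) - p_k N_j(T)\bigr) \le C \exp(2v'T) \tag{22}$$

for some constant $C$.

Proof. Let $0 < \epsilon < \exp(-10T)$ be such that $K = T/\epsilon$ is an integer. Then $\{U_k = L_{k\epsilon}\}_{k=0}^{K}$ is a
martingale, and

$$\mathrm{var}\bigl(p_j N_k(T) - p_k N_j(T)\bigr) = \sum_{k=0}^{K-1} \mathrm{var}(U_{k+1} - U_k).$$

We want to estimate the variance of $U_{k+1} - U_k$. Let $v_k = \|N_{(k+1)\epsilon} - N_{k\epsilon}\|$. Clearly,

$$\mathrm{var}(U_{k+1} - U_k) \le \mathrm{var}(v_k) \exp\bigl[2v'(T - (k+1)\epsilon)\bigr].$$

Using Claims 1 and 2,

$$\begin{aligned}
\mathrm{var}(v_k) &= E\bigl(\mathrm{var}(v_k \mid N_{k\epsilon})\bigr) + \mathrm{var}\bigl(E(v_k \mid N_{k\epsilon})\bigr) \\
&\le \exp(wk\epsilon)\,(e^{2\epsilon} - 1) + \exp(4k\epsilon)\,(e^{2\epsilon} - 1)^2 \\
&\le 5\epsilon \exp(wk\epsilon) + 4\epsilon^2 \exp(4k\epsilon) \le C_0\, \epsilon \exp(wk\epsilon)
\end{aligned}$$

for $C_0 = 6$, by the choice of $\epsilon$. Therefore,

$$\begin{aligned}
\mathrm{var}\bigl(p_j N_k(T) - p_k N_j(T)\bigr) &\le C_0\, \epsilon \sum_{k=0}^{K-1} \exp\bigl(wk\epsilon + 2v'(T - (k+1)\epsilon)\bigr) \\
&\le C_0\, e^{2v'T} \int_0^T e^{(w - 2v')t}\, dt \le C \exp(2v'T)
\end{aligned}$$

for

$$C = C_0 \int_0^{\infty} e^{(w - 2v')t}\, dt < \infty.$$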
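
The last step of Claim 3 compares a Riemann-type sum with the integral $\int_0^\infty e^{(w-2v')t}\,dt = 1/(2v'-w)$, which is finite because $v' > w/2$. The sketch below checks this comparison numerically; it is only an illustration. The function names and the sample values of $w$, $v'$, $T$ and $K$ are assumptions, and the grid is far coarser than the $\epsilon < \exp(-10T)$ required in the proof.

```python
import math


def riemann_sum_bound(w, v_prime, T, K, c0=6.0):
    """C0 * eps * sum_{k<K} exp(w*k*eps + 2*v'*(T - (k+1)*eps)) with eps = T/K."""
    eps = T / K
    total = 0.0
    for k in range(K):
        total += math.exp(w * k * eps + 2.0 * v_prime * (T - (k + 1) * eps))
    return c0 * eps * total


def closed_form_bound(w, v_prime, T, c0=6.0):
    """C * exp(2*v'*T) with C = C0 * integral_0^inf exp((w - 2v')t) dt = C0 / (2v' - w)."""
    if not w / 2.0 < v_prime:
        raise ValueError("need v' > w/2 so that the integral converges")
    return c0 / (2.0 * v_prime - w) * math.exp(2.0 * v_prime * T)
```

With, say, `w = 1.0`, `v_prime = 0.8`, `T = 2.0`, the sum stays below the closed-form bound for every grid size `K`, because each summand carries the factor $e^{-2v'\epsilon}$ coming from the index $k+1$.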

Choose some $v$ strictly between $v'$ and $w$ in a way that $w - v < 0.25 \min(0.1, v - v')$ and let $\delta = \min(0.1, v - v')$.
Using Chebyshev's inequality,

$$P\Bigl(p_j N_k(T) - p_k N_j(T) > \tfrac{1}{2} e^{vT}\Bigr) \le C e^{-2\delta T}. \tag{23}$$

Let {T'i}i=i^2,... be such that e^''^* = i^. By Borel-Cantelli, almost surely there exists iq such that for all 

i > io, 

$$p_j N_k(T_i) - p_k N_j(T_i) < \tfrac{1}{2} e^{vT_i}. \tag{24}$$

We want to show that almost surely for all $T$ large enough,

$$p_j N_k(T) - p_k N_j(T) < e^{vT}. \tag{25}$$

We know that $E(N(T_i)) = O(\exp(wT_i))$, and using a martingale argument similar to the one in Claim 3, we
get that $\mathrm{var}(N(T_i)) = O(\exp(2wT_i))$ and therefore

P{N{Ti) > e(-+o-6*)^^) < Qe-i-2^^> = Qr^-^ 



for some constant $C_1$, and therefore, if $m(i)$ is the number of moves between $T_i$ and $T_{i+1}$, then
P (m{i) > ^e"^^ 
< P (iV(TO > e'"+(0-<^*)'^') + P (^m{i) > ^e^^' 

where the last inequality uses Claim 1 and the fact that



(26) 



l^vT. ^ 2e(-+0-^^)^'(exp(2(r.+i - T,)) - 1) 



Using Borel-Cantelli, we conclude that almost surely, 



fe=i 



vTi 



(27) 



for all $k$ and all $i$ large enough and all $T$ between $T_i$ and $T_{i+1}$. (25) follows from (27).



Proof (Proof of Lemma 2). Using the same martingale argument as above, we can conclude that $\mathrm{var}(N_A(t) \mid N_A(0) = 1) \le C_1 e^{2wt}$, while $E(N_A(t) \mid N_A(0) = 1) \ge C_2 e^{wt}$. Therefore there exists $\rho > 0$ such that

$$P\bigl(N_A(t) > \rho e^{wt} \mid N_A(0) = 1\bigr) > \rho. \tag{28}$$

Fix some large $T$, and let $t_i = iT$. Then using (28) and independence,

P (^NAiU) > ^e™^ NA{ti-i)^ > 1 - e-A^^(*.-i) (29) 

where (29) was obtained using Chernoff's bound. From (29), we get that almost surely, for all $i$ large enough,



NA{ti) > exp ( i 
$N_A(t)$ is monotone increasing, and therefore

NA{t) >Cexplt 



wT + log 



w — — log ( — 



(30) 



for all $t$ large enough. Using Lemma 3 we conclude that

$$N_k(t) > e^{vt}$$

for all $k$ and large enough $t$.

Proposition 4 For every $k$ and $j$, almost surely

$$\lim_{t \to \infty} \frac{N_k(t)}{N_j(t)} = \frac{p_k}{p_j}. \tag{31}$$

Proof. This follows immediately from Lemma 2 and Lemma 3.



B.2 Back to discrete time 



Proposition 5 For the discrete time process, and $A > \max\{A_1, A_2\}$, there exists a vector $q$ such that, for
$k < A$, we have

$$\lim_{T \to \infty} \frac{N_k(T)}{T + 1} = q_k. \tag{32}$$

Proof. The i-th newcomer is of degree zero with probability 

However, by (31), this expression tends to a limit, and therefore, using the law of large numbers, 

1-^=^0 = ^1^ (33) 



T— *CX) 



Using (31) once more, the proposition now follows for $k \ge 1$ with $q_k = (1 - q_0) p_k$.

Note that the above proposition implies that $q_k$ and hence $p_k$ is independent of $A$ if $A > k$, since the left
hand side of (32) does not depend on $A$ if $A > k$. So, in particular, $p_1$ does not depend on $A$.



C Monotonicity properties of w 

In this section we will prove that the exponent $1 + w$ of the power law in Proposition 3 is monotonically
decreasing in $A_1$ and monotonically increasing in $A_2$. For this purpose, it will be useful to define a family of
matrices, parametrized by two vectors $y, z \in \mathbb{R}^n$, which generalizes the matrix $M$ appearing in (15), whose
top eigenvalue is $w$.

Given vectors $y = (y_1, y_2, \ldots, y_n)$, $z = (z_1, z_2, \ldots, z_n) \in \mathbb{R}^n$, let $M(y, z)$ denote the $n$-by-$n$ matrix whose
$(i,j)$-th entry is:

$$M_{ij}(y, z) = \begin{cases} -y_j & \text{if } i = j \\ y_j & \text{if } i = j + 1 \\ z_j & \text{if } i = 1 \text{ and } j > 1 \\ 0 & \text{otherwise.} \end{cases}$$

Thus, for instance, the matrix $M$ defined in (15) is $M(y, z)$, where

$$y = (1, 2, \ldots, A_2 - 1, A_2, A_2, A_2, \ldots, A_2)$$

$$z = (0, 0, \ldots, 0, \min(A_1, A_2), \min(A_1 + 1, A_2), \ldots, A_2, A_2)$$

For the remainder of this section, we will assume:

$$y_i > 0 \quad \text{for } 1 \le i \le n, \tag{34}$$

$$z_i \ge 0 \quad \text{for } 1 \le i \le n, \tag{35}$$

$$z_n > 0. \tag{36}$$

All of these criteria will be satisfied by the matrices $M(y, z)$ which arise in proving the desired monotonicity
claim. It follows from (34), (35), and (36) that if we add a suitably large scalar multiple of the identity matrix
to $M(y, z)$, we obtain an irreducible matrix $M(y, z) + BI$ with non-negative entries. The Perron-Frobenius
Theorem guarantees that $M(y, z) + BI$ has a positive real eigenvalue $R$ of multiplicity 1, such that all
other complex eigenvalues have modulus $\le R$; consequently $M(y, z)$ has a real eigenvalue $w = R - B$, of
multiplicity 1, such that the real part of every other eigenvalue is strictly less than $w$.
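
The shift argument can be tried out numerically. The sketch below builds $M(y, z)$ from the entry formula above, adds $BI$ with the assumed choice $B = \max_i y_i + 1$, and runs plain power iteration to estimate $R$ and hence $w = R - B$. The function names, the choice of $B$, and the iteration count are illustrative assumptions, not part of the original proof.

```python
def build_matrix(y, z):
    """n-by-n matrix M(y, z); lists are 0-based, so y[j - 1] holds y_j."""
    n = len(y)
    m = [[0.0] * n for _ in range(n)]
    for j in range(n):
        m[j][j] = -y[j]              # entry (j, j): -y_j
        if j + 1 < n:
            m[j + 1][j] = y[j]       # entry (j + 1, j): y_j
        if j >= 1:
            m[0][j] = z[j]           # entry (1, j) for j > 1: z_j
    return m


def top_eigenvalue(y, z, iterations=2000):
    """Estimate w = R - B, where R is the Perron root of M(y, z) + B*I."""
    n = len(y)
    m = build_matrix(y, z)
    shift = max(y) + 1.0             # B: makes every diagonal entry positive
    for i in range(n):
        m[i][i] += shift
    v = [1.0 / n] * n                # positive start vector with sum 1
    growth = 0.0
    for _ in range(iterations):
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        growth = sum(mv)             # tends to R because sum(v) == 1
        v = [x / growth for x in mv]
    return growth - shift
```

For $y = (1, 2, 3)$ and $z = (0, 2, 3)$ the characteristic polynomial works out to $\lambda^3 + 6\lambda^2 + 9\lambda - 6$, and `top_eigenvalue([1.0, 2.0, 3.0], [0.0, 2.0, 3.0])` should return its largest real root.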

We will study how $w$ varies under perturbations of the parameters $y, z$. Let $P(\lambda, y, z)$ be the characteristic
polynomial of $M(y, z)$, i.e.

$$P(\lambda, y, z) = \det(\lambda I - M(y, z)).$$

This is a polynomial of degree $n$ in $\lambda$ (with coefficients depending smoothly on $y, z$), whose largest real root
$w(y, z)$ exists and has multiplicity 1, provided $(y, z)$ belongs to the region $V \subset \mathbb{R}^n \times \mathbb{R}^n$ determined by
(34), (35), and (36). It follows from the Implicit Function Theorem that $w(y, z)$ is a smooth function of $(y, z)$
in $V$, satisfying:

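$$\frac{\partial w}{\partial y_i} = -\frac{\partial P / \partial y_i}{\partial P / \partial w}, \qquad \frac{\partial w}{\partial z_i} = -\frac{\partial P / \partial z_i}{\partial P / \partial w}. \tag{37}$$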



If $x$ is any vector in $\mathbb{R}^n \times \mathbb{R}^n$, and $\partial_x$ is the corresponding directional derivative operator, we have from (37):

$$\partial_x w(y, z) = -\frac{\partial_x P(w, y, z)}{(\partial P / \partial w)|_{(w,y,z)}}. \tag{38}$$

We know that $(\partial P / \partial w)|_{(w,y,z)} > 0$ because $P$ is a polynomial with positive leading coefficient, $w$ is its largest
real root, and $w$ has multiplicity 1. Thus we've established:

Claim 4 For any vector $x \in \mathbb{R}^n \times \mathbb{R}^n$, and any $(y, z) \in V$, put $w = w(y, z)$. Then the directional derivatives
$\partial_x w(y, z)$ and $\partial_x P(w, y, z)$ have opposite signs.

This allows monotonicity properties of $w$ to be deduced from calculations involving directional derivatives
of $P$. Given the definition of $M(y, z)$, it is straightforward to compute that

$$P(\lambda, y, z) = \det(\lambda I - M(y, z)) = \prod_{i=1}^{n} (\lambda + y_i) - \sum_{j=2}^{n} P_j(\lambda, y, z), \tag{39}$$

where

$$P_j(\lambda, y, z) = \Bigl(\prod_{i=1}^{j-1} y_i\Bigr)\, z_j \Bigl(\prod_{i=j+1}^{n} (\lambda + y_i)\Bigr). \tag{40}$$
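
Formula (39) can be sanity-checked against a direct determinant. The following sketch evaluates both sides for given $y$, $z$ and $\lambda$; the helper names and the sample vectors are illustrative assumptions, and the determinant is computed by ordinary Gaussian elimination rather than by any method from the paper.

```python
def char_poly_formula(lam, y, z):
    """P(lam, y, z) from (39)-(40); lists are 0-based, so y[j - 1] holds y_j."""
    n = len(y)
    full = 1.0
    for yi in y:
        full *= lam + yi
    correction = 0.0
    for j in range(1, n):            # 0-based j stands for the 1-based index j + 1 = 2..n
        left = 1.0
        for i in range(j):
            left *= y[i]
        right = 1.0
        for i in range(j + 1, n):
            right *= lam + y[i]
        correction += left * z[j] * right
    return full - correction


def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det


def char_poly_determinant(lam, y, z):
    """det(lam*I - M(y, z)) assembled directly from the entries of M(y, z)."""
    n = len(y)
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        a[j][j] = lam + y[j]
        if j + 1 < n:
            a[j + 1][j] = -y[j]
        if j >= 1:
            a[0][j] = -z[j]
    return determinant(a)
```

The two functions should agree up to rounding for any choice of `lam`, which is a quick way to confirm that the index ranges in (39) and (40) were read correctly.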

The following three lemmas encapsulate the requisite directional derivative estimates. 

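Lemma 4. $(\partial P / \partial z_k)|_{(w,y,z)} < 0$ for $(y, z) \in V$.

Proof.

$$\frac{\partial P}{\partial z_k} = -\frac{\partial P_k}{\partial z_k} = -\Bigl(\prod_{i=1}^{k-1} y_i\Bigr) \Bigl(\prod_{i=k+1}^{n} (w + y_i)\Bigr) < 0.$$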



Corollary 6 $w$ is monotonically decreasing in $A_1$.

Proof. Increasing $A_1$ from $k$ to $k+1$ has no effect on $y$, and its only effect on $z$ is to decrease $z_k$ from $\min(k, A_2)$
to 0. As we move in the $-z_k$ direction, the directional derivative of $P$ is positive, so the directional derivative
of $w$ is negative by Claim 4. Thus $w$ decreases as we increase $A_1$ from $k$ to $k + 1$.

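Lemma 5. $(\partial P / \partial y_k)|_{(w,y,z)} < 0$ if $(y, z) \in V$ and $z_k = 0$.

Proof. Since $z_k = 0$ we have $P_k = 0$, so

$$\begin{aligned}
\frac{\partial P}{\partial y_k} &= \frac{\partial}{\partial y_k} \Bigl[\, \prod_{i=1}^{n} (w + y_i) - \sum_{j=2}^{k-1} P_j - \sum_{j=k+1}^{n} P_j \Bigr] \\
&= \frac{1}{w + y_k} \prod_{i=1}^{n} (w + y_i) - \frac{1}{w + y_k} \sum_{j=2}^{k-1} P_j - \frac{1}{y_k} \sum_{j=k+1}^{n} P_j \\
&< \frac{1}{w + y_k} \prod_{i=1}^{n} (w + y_i) - \frac{1}{w + y_k} \sum_{j=2}^{k-1} P_j - \frac{1}{w + y_k} \sum_{j=k+1}^{n} P_j \\
&= \frac{P(w, y, z)}{w + y_k} = 0.
\end{aligned}$$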



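Lemma 6. $(\partial P / \partial y_k + \partial P / \partial z_k)|_{(w,y,z)} < 0$ if $(y, z) \in V$ and $y_k = z_k$.

Proof.

$$\begin{aligned}
\frac{\partial P}{\partial y_k} + \frac{\partial P}{\partial z_k} &= \Bigl(\frac{\partial}{\partial y_k} + \frac{\partial}{\partial z_k}\Bigr) \Bigl[\, \prod_{i=1}^{n} (w + y_i) - \sum_{j=2}^{k-1} P_j - P_k - \sum_{j=k+1}^{n} P_j \Bigr] \\
&= \frac{1}{w + y_k} \prod_{i=1}^{n} (w + y_i) - \frac{1}{w + y_k} \sum_{j=2}^{k-1} P_j - \frac{1}{z_k} P_k - \frac{1}{y_k} \sum_{j=k+1}^{n} P_j \\
&< \frac{1}{w + y_k} \prod_{i=1}^{n} (w + y_i) - \frac{1}{w + y_k} \sum_{j=2}^{k-1} P_j - \frac{1}{w + y_k} P_k - \frac{1}{w + y_k} \sum_{j=k+1}^{n} P_j \\
&= \frac{P(w, y, z)}{w + y_k} = 0.
\end{aligned}$$
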

Corollary 7 $w$ is monotonically increasing in $A_2$.

Proof. If we change $A_2$ from $k$ to $k + 1$, this changes

$$y = (1, 2, \ldots, k-1, k, k, \ldots, k)$$

into

$$y' = (1, 2, \ldots, k-1, k, k+1, \ldots, k+1)$$

and it changes

$$z = (0, 0, \ldots, 0, \min(A_1, k), \min(A_1 + 1, k), \ldots, k, k)$$

into

$$z' = (0, 0, \ldots, 0, \min(A_1, k+1), \min(A_1 + 1, k+1), \ldots, k+1, k+1).$$

Letting $e_j^{(y)}$ denote a unit vector in the $+y_j$ direction, and $e_j^{(z)}$ a unit vector in the $+z_j$ direction, the
direction of change is expressed by the vector

$$x = (y', z') - (y, z) = \sum_{k+1 \le j < A_1} e_j^{(y)} + \sum_{\max(k+1, A_1) \le j} \bigl(e_j^{(y)} + e_j^{(z)}\bigr),$$

and the directional derivative of $P$ in this direction is negative, by the preceding two lemmas. By Claim 4, this means $w$ increases monotonically as we
move along this path.
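
Both corollaries can be illustrated numerically. Dividing (39) by $\prod_i (\lambda + y_i)$ gives a function that is strictly increasing for $\lambda > -\min_i y_i$ under (34)-(36), so $w$ can be located by bisection. This rearrangement, the function names, and the truncation size $n = 8$ used below are assumptions made for the sketch; the vectors $y$, $z$ follow the parametrization given for the matrix in (15).

```python
def model_vectors(a1, a2, n):
    """Vectors y, z of length n for cutoffs A_1 = a1 and A_2 = a2 (stored 0-based)."""
    y = [float(min(j, a2)) for j in range(1, n + 1)]
    z = [float(min(j, a2)) if j >= a1 else 0.0 for j in range(1, n + 1)]
    return y, z


def normalized_poly(lam, y, z):
    """P(lam, y, z) / prod_i (lam + y_i); has the sign of P for lam > -min(y)."""
    total = 1.0
    weight = 1.0 / (lam + y[0])      # prod_{i<j} y_i / prod_{i<=j} (lam + y_i) at j = 1
    for j in range(1, len(y)):
        weight *= y[j - 1] / (lam + y[j])
        total -= z[j] * weight
    return total


def top_root(y, z, tol=1e-12):
    """Largest real root w of P, by bisection on the increasing function normalized_poly."""
    lo = -min(y) + 1e-9              # normalized_poly is very negative just right of -min(y)
    hi = 1.0
    while normalized_poly(hi, y, z) <= 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normalized_poly(mid, y, z) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For instance, with `n = 8` one can compare `top_root(*model_vectors(a1, 3, 8))` for `a1 = 2, 3, 4`, which should decrease, and `top_root(*model_vectors(2, a2, 8))` for `a2 = 2, 3, 4`, which should increase.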

D Proof of Claim 1 

To see the finiteness of $X_t$, we need to show that $\sum_{n=1}^{\infty} \sigma_n = \infty$ a.s. But this follows easily from the fact
that $\sum_{n=1}^{\infty} \frac{1}{n} = \infty$. To see (21), we use the following argument.

The standard birth process is equivalent to the following process: Start with one cell at time 0. At each 
time, every cell multiplies with rate 2. Xt is the number of cells at time t. 

Lemma 7 (Joel Spencer). For every $t > 0$ and every positive integer $k$, $E(X_t^k) < \infty$.

Proof. Let $T = (V, E)$ be an infinite rooted binary tree, and let $\{w_e\}_{e \in E}$ be i.i.d. exponential variables with
expected value 0.5. Then, $X_t$ is dominated by the size of

$$Y_t = \Bigl\{ v \in V : \sum_{e \in \pi(v)} w_e < t \Bigr\}$$

where $\pi(v)$ is the path from the root to $v$. For $v$ of depth $n$, the probability that $v$ is in $Y_t$ is

$$P(\mathrm{Poisson}(2t) > n) = o\bigl(((n/2)!)^{-1}\bigr).$$

Therefore, 

oo 

P(|^t| >h)< 2"P(Poisson(2i) > n) < C{t)h-^°^^ 

and therefore all moments of Xt are finite. 
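
To get a feeling for how fast the terms in this union bound decay, one can tabulate $2^n \, P(\mathrm{Poisson}(2t) \ge n)$, which is the expected number of depth-$n$ tree vertices whose $n$ edge weights sum to less than $t$. The sketch below is only an illustration; the function names and the truncation of the Poisson tail after 400 terms are assumptions.

```python
import math


def poisson_tail(mean, n, extra_terms=400):
    """P(Poisson(mean) >= n), summed term by term and truncated after extra_terms terms."""
    # First term e^{-mean} * mean^n / n!, computed in log space to avoid overflow.
    log_term = -mean + n * math.log(mean) - math.lgamma(n + 1)
    term = math.exp(log_term)
    total = 0.0
    for m in range(n, n + extra_terms):
        total += term
        term *= mean / (m + 1)
    return total


def union_bound_terms(t, depths):
    """Terms 2^n * P(Poisson(2t) >= n) for each depth n in depths."""
    return [2.0 ** n * poisson_tail(2.0 * t, n) for n in depths]
```

For $t = 1$ the terms first grow and then collapse factorially, and their sum over all $n \ge 0$ equals $2E(2^{N}) - 1 = 2e^{2} - 1$ for $N \sim \mathrm{Poisson}(2)$, so `sum(union_bound_terms(1.0, range(60)))` should be very close to that value.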

Claim 1 will follow from Chebyshev if we show that

$$E(X_{t_2} \mid X_{t_1}) = X_{t_1}\, O\bigl(e^{2(t_2 - t_1)}\bigr) \tag{41}$$

and

$$\mathrm{var}(X_{t_2} \mid X_{t_1}) = X_{t_1}\, O\bigl(e^{4(t_2 - t_1)}\bigr) \tag{42}$$

for $t_2 > t_1$. To show (41) and (42), it is enough to show that $E(X_t) = O(e^{2t})$ and $\mathrm{var}(X_t) = O(e^{4t})$.
$E(X_t) = O(e^{2t})$ follows because $f(t) = E(X_t)$ satisfies the differential equation

$$f'(t) = 2 f(t).$$

$\mathrm{var}(X_t) = O(e^{4t})$ follows using the exact same martingale argument as in Claim 3.



